This report contains the code used to produce the panels of figure 6 of the paper including the re-analysis of the for SRRM2 Knockdown (KD) RNA-seq in HeLa(Zhang et al. 2024) (Bo Wang lab) and HepG2(Cui et al. 2023) (Ben Blewncowe, Alan Lambowitz & Paul Schimmel) cells.
2 Set Up
Last code execution: 2025 March 04, Tuesday @ 10:03:24.
2.1 Packages
Load packages required for the analysis and suppress any message. Check the Section 8 section at the end for package versions.
In addition I developed an R package called niar to stream line some of the common analysis steps. It can be installed from my GitHub repository using:
Set the ∆PSI threshold, the ∆PSI “squish” threshold in the correlation scatter plots.
Code
dPSI_squish <-40dPSI_thshld <-15
Note
In some plots the points higher than |∆PSI| > 40 are squished and plotted at that value. This helps showing all points on the plot and keeps the data less dispersed.
Here I report the analysis code to reproduce the findings in figure 6.
3 Number of DSE
To display the number of differentially spliced events (DSE) I read the files I wrote in the previous analysis reports, integrating 3 different methods, and plot them as bar plots. This file contains all the shared and uniquely differentially spliced events between TAF2 and NLS-∆IDR-TAF2.
Code
DSE_TAF2 <-read_delim(file = dse_path_TAF2, delim ='\t', col_names = T, show_col_types = F)num_DSE_TAF2 <- DSE_TAF2 |>select(EXPERIMENT, EVENT, AS_TYPE, DIRECTION) |>unique() |>group_by(EXPERIMENT, AS_TYPE, DIRECTION) |>summarise(Num_DSE =n(), .groups ='keep') |>mutate(AS_TYPE =str_replace(pattern ='Acceptor', replacement ="3' ss", AS_TYPE)) |>mutate(AS_TYPE =str_replace(pattern ='Donor', replacement ="5' ss", AS_TYPE)) |>mutate(CELL_TYPE ='HeLa')# add fake data so that the bars in the plot have all the same widthfiller_data <-data.frame(EXPERIMENT =rep(c('TAF2', 'TAF2∆IDR'), 2), AS_TYPE =c( rep("Alt. 3' ss", 2), rep("Alt. 5' ss", 2) ), CELL_TYPE ='HeLa',DIRECTION ='DOWN', Num_DSE =-1)num_DSE_TAF2 <-rbind(num_DSE_TAF2, filler_data)
Import SRRM2 Knockdown events in HeLa cells.
Code
DSE_SRRM2 <-read_delim(file = dse_path_SRRM2_HeLa,delim ='\t', col_names = T, show_col_types = F)num_DSE_SRRM2_HeLa <- DSE_SRRM2 |>select(EXPERIMENT, EVENT, AS_TYPE, DIRECTION) |>unique() |>group_by(EXPERIMENT, AS_TYPE, DIRECTION) |>summarise(Num_DSE =n(), .groups ='keep') |>mutate(EXPERIMENT =str_remove(pattern ="^HeLa-", EXPERIMENT)) |>mutate(AS_TYPE =str_replace(pattern ='Acceptor', replacement ="3' ss", AS_TYPE)) |>mutate(AS_TYPE =str_replace(pattern ='Donor', replacement ="5' ss", AS_TYPE)) |>mutate(CELL_TYPE ='HeLa')# add fake data so that the bars in the plot have all the same widthfiller_data <-data.frame(EXPERIMENT =rep(c('SRRM2-KD'), 2), AS_TYPE =c( rep("Alt. 3' ss", 2), rep("Alt. 5' ss", 2) ), CELL_TYPE ='HeLa',DIRECTION ='DOWN', Num_DSE =-1)num_DSE_SRRM2_HeLa <-rbind(num_DSE_SRRM2_HeLa, filler_data)
For completeness I also import the differentially spliced events upon SRRM2 KD in HepG2.
Code
DSE_SRRM2_HepG2 <-read_delim(file = dse_path_SRRM2_HepG2,delim ='\t', col_names = T, show_col_types = F)num_DSE_SRRM2_HepG2 <- DSE_SRRM2_HepG2 |>select(EXPERIMENT, EVENT, AS_TYPE, DIRECTION) |>unique() |>group_by(EXPERIMENT, AS_TYPE, DIRECTION) |>summarise(Num_DSE =n(), .groups ='keep') |>mutate(EXPERIMENT =str_remove(pattern ="^HepG2-", EXPERIMENT)) |>mutate(AS_TYPE =str_replace(pattern ='Acceptor', replacement ="3' ss", AS_TYPE)) |>mutate(AS_TYPE =str_replace(pattern ='Donor', replacement ="5' ss", AS_TYPE)) |>mutate(CELL_TYPE ='HepG2')# add fake data so that the bars in the plot have all the same widthfiller_data <-data.frame(EXPERIMENT =rep(c('SRRM2-KD'), 2), AS_TYPE =c( rep("Alt. 3' ss", 2), rep("Alt. 5' ss", 2) ), CELL_TYPE ='HepG2',DIRECTION ='DOWN', Num_DSE =-1)num_DSE_SRRM2_HepG2 <-rbind(num_DSE_SRRM2_HepG2, filler_data)
Submiting results from GOfuncR to rrvgo...
rrvgo is now slimming BP GO terms from GOfuncR
There are 64 clusters in your GO BP terms from GOfuncR
rrvgo is now slimming CC GO terms from GOfuncR
There are 18 clusters in your GO CC terms from GOfuncR
rrvgo is now slimming MF GO terms from GOfuncR
There are 29 clusters in your GO MF terms from GOfuncR
Submiting results from GOfuncR to rrvgo...
rrvgo is now slimming BP GO terms from GOfuncR
There are 49 clusters in your GO BP terms from GOfuncR
rrvgo is now slimming CC GO terms from GOfuncR
There are 8 clusters in your GO CC terms from GOfuncR
rrvgo is now slimming MF GO terms from GOfuncR
There are 26 clusters in your GO MF terms from GOfuncR
Submiting results from GOfuncR to rrvgo...
rrvgo is now slimming BP GO terms from GOfuncR
There are 50 clusters in your GO BP terms from GOfuncR
rrvgo is now slimming CC GO terms from GOfuncR
There are 16 clusters in your GO CC terms from GOfuncR
rrvgo is now slimming MF GO terms from GOfuncR
There are 26 clusters in your GO MF terms from GOfuncR
Submiting results from GOfuncR to rrvgo...
rrvgo is now slimming BP GO terms from GOfuncR
There are 57 clusters in your GO BP terms from GOfuncR
rrvgo is now slimming CC GO terms from GOfuncR
There are 14 clusters in your GO CC terms from GOfuncR
rrvgo is now slimming MF GO terms from GOfuncR
There are 28 clusters in your GO MF terms from GOfuncR
Submiting results from GOfuncR to rrvgo...
rrvgo is now slimming BP GO terms from GOfuncR
There are 40 clusters in your GO BP terms from GOfuncR
rrvgo is now slimming CC GO terms from GOfuncR
There are 12 clusters in your GO CC terms from GOfuncR
rrvgo is now slimming MF GO terms from GOfuncR
There are 17 clusters in your GO MF terms from GOfuncR
Submiting results from GOfuncR to rrvgo...
rrvgo is now slimming BP GO terms from GOfuncR
There are 70 clusters in your GO BP terms from GOfuncR
rrvgo is now slimming CC GO terms from GOfuncR
There are 26 clusters in your GO CC terms from GOfuncR
rrvgo is now slimming MF GO terms from GOfuncR
There are 36 clusters in your GO MF terms from GOfuncR
Select columns and prepare the dataset to run a non-parametric Wilcoxon test for each exon feature. It compares the VALUE (feature numeric values) between groups in GROUP2 (type of feature) using 'AS_NC' (alternatively spliced non changing exons) as the reference, then adjusts the P values with the BH (Benjamini–Hochberg) method and adds significance labels to the results.
Reshape test results dataframe for plot. Remove non significant (ns) features.
Code
tidy_res_test_EF <-subset(res_test_EF, p.adj.signif !='ns') |>select(EX_FEAT, group2, p, p.adj, p.adj.signif) |>process_exon_feats_test() |>subset( !CATEGORY %in%c('Ratio to Exon Length', 'Ratio to Exon GC %') ) message('Number of significant features: ', nrow(tidy_res_test_EF) )
Number of significant features: 16
7.3 Plot Exon Features Heatmap
Here facet_row splits the plot into multiple rows, with each row representing a different level of the CATEGORY variable. It also allows each row to have its own x-axis scale and custom labels (via labeller) for clearer facet titles.
Code
ggplot(cmbnd_EF_cat) +aes(x = LOCATION, y = GROUP2) +facet_row( ~CATEGORY, scales ='free_x', space ="free",labeller =labeller(CATEGORY =c("S.S. Score"="Max Entropy Score","Poly Y Track"="PYT offset\nfrom 3'ss","Branch Point"="Branch Point\nScore","Rel. Position"="Rel.\nPosition") ) ) +geom_tile(aes(fill = mean_zF_ASNC), colour ='black') +# Add significance annotations from tidy_res_test_EF:geom_text(data = tidy_res_test_EF, inherit.aes = F, mapping =aes(x = LOCATION, y = group2, label = p.adj.signif),size =2, color ="black", vjust =0.75) +scale_fill_gradient2(low ='dodgerblue3', mid ='white', high ='firebrick3', midpoint =0, name ='Feature Mean Z-Score', na.value ='gray') +scale_x_discrete(expand =expansion(mult =0, add =0)) +scale_y_discrete(expand =expansion(mult =0, add =0)) +coord_cartesian(clip ='off') +guides(fill =guide_colourbar(barheight =unit(2.5, units ="mm"),barwidth =unit(8, units ="cm"),title.vjust =1 ) ) +theme(axis.text =element_text(colour ='black', family ='Arial'),axis.text.x =element_text(angle =45, hjust =1),axis.title =element_blank(),axis.ticks =element_blank(),strip.background =element_blank(),panel.background =element_blank(),panel.border =element_blank(),panel.grid =element_blank(),plot.background =element_blank(),legend.position ='bottom') -> p_heatmapp_heatmap
This document is generated using Quarto which enables to weave together content and executable code into a finished document. This is an improved version of what used to be called ‘Rmarkdown’. The document hides all code chunks, but they can be opened up with the drop down arrow. On the top right corner there’s a toggle for dark-mode. To learn more about Quarto see here.
References
Cui, Haissi, Jolene K. Diedrich, Douglas C. Wu, Justin J. Lim, Ryan M. Nottingham, James J. Moresco, John R. Yates, Benjamin J. Blencowe, Alan M. Lambowitz, and Paul Schimmel. 2023. “Arg-tRNA synthetase links inflammatory metabolism to RNA splicing and nuclear trafficking via SRRM2.”Nature Cell Biology 25 (4): 592–603. https://doi.org/10.1038/s41556-023-01118-8.